Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview
Authors
Abstract
Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires significant insights into expected job runtimes and scaling behavior, resource characteristics, input data distributions, and other factors. Unable to estimate performance accurately, users frequently overprovision resources for their jobs, leading to low resource utilization and high costs. In this paper, we present major building blocks towards a collaborative approach for the optimization of data processing cluster configurations based on runtime data and performance models. We believe that runtime data can be shared and used for performance models across different execution contexts, significantly reducing the reliance on the recurrence of individual processing jobs or, else, dedicated job profiling. For this, we describe how the similarity of processing jobs and cluster infrastructures can be employed to combine suitable data points from local and global job executions into accurate performance models. Furthermore, we outline approaches to performance prediction via more context-aware and reusable models. Finally, we lay out how metrics from previous executions can be combined with runtime monitoring to effectively re-configure models and clusters dynamically.
Similar resources
Parallel and Distributed GIS for Processing Geo-data: An Overview
A Geographic Information System (GIS) is a collection of applications whose tasks include (collaborating with other systems and) gathering geographic data, storing and processing spatio–temporal data (geo-data), and sharing the derived geographic knowledge with users and other applications. Some of the most important routine applications of GIS are spatial analysis, digital elevation model (DEM) anal...
Low Cost Cluster Architectures for Parallel and Distributed Processing
Cluster-based architectures have stood out in recent years as an alternative for the construction of versatile, low-cost parallel machines. This versatility permits their use both as a teaching tool and as a research environment in the field of parallel and distributed processing. This paper describes some of the possibilities found today on the market for the construction of cluster base...
QueueLinker: A Framework for Parallel Distributed Processing of Data Streams
With the development of computer systems, many more devices are being connected to the network and generating ‘data stream.’ Analyzing data streams in real-time offers valuable information about human activities and contributes to many information services. QueueLinker enables programmers to build data stream processing applications by implementing application modules that use a producer–consum...
Distributed Configuration Management for Reconfigurable Cluster Computing
Cluster computing offers many advantages as a highly cost-effective and often scalable approach for high-performance computing in general, and most recently as a basis for hardware-reconfigurable systems. To achieve the full potential of performance of reconfigurable HPC systems, a runtime configuration service is required. Centralized configuration services are a natural starting point but tend...
Eliminating Homogeneous Cluster Setup for Efficient Parallel Data Processing
This project proposes to eliminate homogeneous cluster setup in a parallel data processing environment. A homogeneous cluster setup supports static nature of processing which is a huge disadvantage for optimising the response time towards clients. Parallel data processing is performed more often in today's internet and it is very important for the server to deliver the services to its clie...
Journal
Journal title: Datenbank-Spektrum
Year: 2022
ISSN: 1618-2162, 1610-1995
DOI: https://doi.org/10.1007/s13222-022-00416-z